An efficient high-probability algorithm for Linear Bandits
Authors
Abstract
For the linear bandit problem, we extend the analysis of algorithm CombEXP from Combes et al. [2015] to the high-probability case against adaptive adversaries, allowing actions to come from an arbitrary polytope. We prove a high-probability regret of O(T^(2/3)) for time horizon T. While this bound is weaker than the optimal O(√T) bound achieved by GeometricHedge in Bartlett et al. [2008], CombEXP is computationally efficient, requiring only an efficient linear optimization oracle over the convex hull of the actions.
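The O(T^(2/3)) rate in the abstract is the rate that forced exploration at schedule ε_t ~ t^(-1/3) produces. The sketch below is not the paper's CombEXP; it is a toy ε-greedy run on a K-armed instance (with made-up arm means and an assumed Gaussian noise model) that only illustrates why this exploration schedule yields sublinear, T^(2/3)-order regret:

```python
import random

def eps_greedy_bandit(means, horizon, seed=0):
    """Toy K-armed bandit run with exploration rate eps_t ~ t^(-1/3).

    Illustrative only: forced exploration at this rate is a standard way
    to obtain O(T^(2/3)) regret, but this is NOT the CombEXP algorithm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    estimates = [0.0] * k
    regret = 0.0
    best = max(means)
    for t in range(1, horizon + 1):
        eps = min(1.0, t ** (-1.0 / 3.0))
        if rng.random() < eps:
            arm = rng.randrange(k)                  # explore uniformly
        else:
            arm = estimates.index(max(estimates))   # exploit best estimate
        reward = means[arm] + rng.gauss(0, 0.1)     # assumed Gaussian noise
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        regret += best - means[arm]                 # pseudo-regret
    return regret
```

With this schedule the expected number of exploration rounds up to T scales as Σ t^(-1/3) ≈ (3/2)T^(2/3), which is where the T^(2/3) term in the regret comes from.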
Similar resources
CBRAP: Contextual Bandits with RAndom Projection
Contextual bandits with linear payoffs, which are also known as linear bandits, provide a powerful alternative for solving practical problems of sequential decisions, e.g., online advertisements. In the era of big data, contextual data usually tend to be high-dimensional, which leads to new challenges for traditional linear bandits mostly designed for the setting of low-dimensional contextual d...
Thompson Sampling for Contextual Bandits with Linear Payoffs
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance than state-of-the-art methods. However, many questions regarding its theoretical performance remained open. In this paper, we d...
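As a concrete illustration of the heuristic this snippet describes, here is a minimal Thompson Sampling sketch for the simpler Bernoulli K-armed case with Beta(1,1) priors; the contextual/linear-payoff variant the paper studies maintains a posterior over a parameter vector instead. The arm probabilities and horizon below are made-up example values:

```python
import random

def thompson_bernoulli(probs, horizon, seed=0):
    """Minimal Thompson Sampling for a Bernoulli K-armed bandit.

    Each arm keeps a Beta posterior; every round we sample a mean from
    each posterior and pull the arm with the largest sample. A toy
    sketch, not the contextual/linear-payoff algorithm of the paper.
    """
    rng = random.Random(seed)
    k = len(probs)
    alphas = [1] * k   # Beta alpha parameters (prior successes + 1)
    betas = [1] * k    # Beta beta parameters (prior failures + 1)
    pulls = [0] * k
    for _ in range(horizon):
        samples = [rng.betavariate(alphas[i], betas[i]) for i in range(k)]
        arm = samples.index(max(samples))
        reward = 1 if rng.random() < probs[arm] else 0
        alphas[arm] += reward
        betas[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Randomizing over posterior samples gives the algorithm its exploration: arms with uncertain posteriors occasionally produce large samples and get pulled, while clearly inferior arms are pulled less and less often.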
Parametric Bandits: The Generalized Linear Case
We consider structured multi-armed bandit problems based on the Generalized Linear Model (GLM) framework of statistics. For these bandits, we propose a new algorithm, called GLM-UCB. We derive finite-time, high-probability bounds on the regret of the algorithm, extending previous analyses developed for linear bandits to the non-linear case. The analysis highlights a key difficulty in genera...
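GLM-UCB builds on the optimism principle behind UCB-style algorithms: pull the arm whose high-probability upper confidence bound on the mean reward is largest. The sketch below shows that principle for a plain K-armed bandit (classic UCB1 with an assumed Gaussian noise model and made-up arm means); it is not the paper's GLM-UCB, which instead builds its confidence region around a GLM parameter estimate:

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """UCB1 on a K-armed bandit: after one round-robin pass, pull the
    arm maximizing empirical mean + sqrt(2 ln t / n), a high-probability
    upper confidence bound on its true mean. Illustrative sketch only.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    estimates = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1                              # pull each arm once
        else:
            scores = [estimates[i] + math.sqrt(2 * math.log(t) / counts[i])
                      for i in range(k)]
            arm = scores.index(max(scores))          # optimistic choice
        reward = means[arm] + rng.gauss(0, 0.1)      # assumed Gaussian noise
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts
```

The confidence radius sqrt(2 ln t / n) shrinks as an arm is pulled, so suboptimal arms are pulled only O(log T) times; the GLM setting complicates exactly this step because the confidence region for the parameter must pass through a non-linear link function.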
G: Bandits, Experts and Games, 09/12/16, Lecture 4: Lower Bounds (ending); Thompson Sampling
Here is a parameter to be adjusted in the analysis. Recall that K is the number of arms. We considered a “bandits with predictions” problem, and proved that it is impossible to make an accurate prediction with high probability if the time horizon is too small, regardless of what bandit algorithm we use to explore and make the prediction. In fact, we proved it for at least a third of problem ins...
An Efficient Method for Selecting a Reliable Path under Uncertainty Conditions
In a network where some paths may become blocked, choosing a reliable path whose survival probability is high is an important and practical issue. This issue is especially significant in critical situations such as natural disasters, floods, and earthquakes. For the reliable path, the survival or blocking of each arc of the network in critical situations is an un...
Journal:
- CoRR
Volume: abs/1610.02072
Pages: -
Publication date: 2016